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Abstract 

In this paper a new similarity-based learning al- 
gorithm, inspired by string edit-distance ( Wag- 
ner and Fischer, 1974| ) , is applied to the problem 
of bootstrapping structure from scratch. The 
algorithm takes a corpus of unannotated sen- 
tences as input and returns a corpus of brack- 
eted sentences. The method works on pairs 
of unstructured sentences or sentences partially 
bracketed by the algorithm that have one or 
more words in common. It finds parts of sen- 
tences that are interchangeable (i.e. the parts 
of the sentences that are different in both sen- 
tences). These parts are taken as possible con- 
stituents of the same type. While this corre- 
sponds to the basic bootstrapping step of the al- 
gorithm, further structure may be learned from 
comparison with other (similar) sentences. 



The algorithm consists of two phases. The 
first phase, alignment learning, builds a large set 
of possible constituents by aligning sentences. 
The second phase, which is called bracket se- 
lection, selects the best constituents from this 
set. 

The rest of the paper is organized as fol- 
lows. We will start out by describing some pre- 
vious work in grammar induction. This is fol- 
lowed by a detailed description of the ABL al- 
gorithm, where three different implementations 
of ABL will be described, ABL:incr, ABL:leaf 
and ABL:branch. We will then discuss some of 
the properties of the ABL algorithms. First, we 
claim the ABL algorithms can generate recur- 
sive structures and then discuss problems with 
PP attachment. After that, some results of the 



algorithms applied to the ATIS corpus (Marcus 



Wp used this mPt.hnH for hnntW.rapping .tmr- et al„ 1993ft and to the OVIS corpus (|Bonnema 



hire from the flat sentences nf thc'Pcnn Tree- gt al., 19971 ) will be presented, followed by a 



bank ATIS corpus, and compared the resulting 
structured sentences to the structured sentences 
in the ATIS corpus. Similarly, the algorithm 
was tested on the OVIS corpus. We obtained 
86.04 % non-crossing brackets precision on the 
ATIS corpus and 89.39 % non-crossing brackets 
precision on the OVIS corpus. 

1 Introduction 

People seem to learn syntactic structure without 
great difficulty. Unfortunately, it is difficult to 
model this process and therefore, it is difficult 
to make a computer learn structure. 

Instead of actually modeling the human pro- 
cess of language learning, we propose a gram- 
mar learning algorithm and apply it to a corpus 
of natural language sentences. The algorithm 
should assign structure to the sentences which 
is similar to the structure people give to the 
sentences. 



description of future research on the ABL algo- 
rithm. 

2 Previous Work 

This section contains a brief overview of previ- 
ous work in grammar acquisition systems. Some 
differences between supervised and unsuper- 
vised acquisition methods will be discussed, fol- 
lowed by a discussion of several different meth- 
ods. We then relate ABL to some of these meth- 
ods. 

2.1 Supervised versus Unsupervised 

Language learning algorithms can be roughly 
divided into two groups, supervised and un- 
supervised learning algorithms based on the 
amount of information about the language they 
use. All learning algorithms use a teacher that 
gives examples of (unstructured) sentences in 
the language. In addition, some algorithms also 



use a critic. A critic may be asked if a cer- 
tain sentence (possibly including structure) is a 
valid sentence in the language. The algorithm 
can use a critic to validate hypotheses about the 
language. 1 Supervised language learning meth- 
ods use a teacher and a critic, whereas the unsu- 
pervised methods only use a teacher, ( powers, 
T997|) 



Supervised language learning methods typi- 
cally generate better results. These methods 
can tune their output, since they receive knowl- 



edge of the structure of the language (by initial- 1996|). 



Chater, 1992) and QRedington et al., 199^ ) mod- 
els are proposed that use distributional informa- 
tion to acquire syntactic categories. 

Another type of language learning system 
is based on the Minimum Description Length 
(MDL) principle. These systems compress the 
input corpus by finding the minimum descrip- 
tion needed to express the corpus. The com- 
pression results in a grammar that can describe 
the corpus. Examples of these systems can be 
found in ( Griinwald, 1994] ) and ( de Marcken 



isation or querying a critic). In contrast, unsu- 
pervised language learning methods do not re- 
ceive these structured sentences, so they do not 
know at all what the output should look like. 

However, it is worthwhile to investigate unsu- 
pervised language learning methods, since "the 
costs of annotation are prohibitively time and 
expertise intensive, and the resulting corpora 
may be too susceptible to restriction to a par- 



ticular domain, application, or genre". (Kehler 
and Stolcke. 19991) 



Wolff (19*82] ) describes a system that performs 
a heuristic search while creating and merg- 
ing symbols directed by an evaluation function. 
Similarly, Cook et al. (1976|) describe an algo- 
rithm that uses a cost function to direct search 
for a grammar. The grammar induction method 
found in ( Stolcke and Omohundro, 1994| ) merges 
elements of models using a Bayesian framework. 
Chen (1995| ) presents a Bayesian grammar in- 
duction method, which is followed by a post- 



pass using the inside-outside algorithm (Baker 



2.2 Language Learning Methods 



1979; Lari and Young, 1990), while Pereira and 
Schabcs (1992|) apply the inside-outside algo- 



In this section a short overview of several super- 
vised and unsupervised language learning meth- 
ods will be given. However, the focus will be 
on on unsupervised language learning methods, 
since we are interested in what results can be 
achieved using as little information as possible. 

There have been many different approaches 
to language learning, only some of the different 
systems will be described here briefly. This sec- 
tion is merely meant to relate the ABL system 
to other language learning methods; we do not 
claim to give a complete overview of this field. 

A lot of recent language learning methods 
are based on probabilistic or counting theo- 
ries. An example of this type of method is 
Memory-Based Learning (MBL) which keeps 
track of the distribution of contexts of words 
and assigns word types based on that informa- 



tion (Daclcmans, 1995|). [Magerman and Mar- 



cus (1990 ) describe a system that can find con- 
stituent boundaries using mutual information of 
n-grams within sentences, while in flFinch and 



an algorithm uses a treebank or structured 
corpus to initialise, it is said to be supervised. The struc- 
ture of the sentences in the corpus can be seen as the 
critic. 



rithm to a partially structured corpus. 

The final system mentioned here is 
Transformation-Based Learning ( |Brill, 1993| ) , 
which is also a supervised system. This 
system differs from others in that it is a 
non-probabilistic system that learns transfor- 
mations to improve a naive parse (for example 
a right branching parse). 

2.3 ABL in relation to other Methods 

ABL consists of two distinct phases, alignment 
learning and bracket selection. Both phases 
can be roughly compared to different language 
learning methods. 

The alignment learning step is a way of com- 
pressing the corpus. Similar techniques can be 
found in systems that are based on the MDL 
principle ( Griinwald, 1994| ). MDL systems com- 
press the corpus by looking for the minimal de- 
scription of the corpus. The MDL principle re- 
sults in grouping reoccurring parts of sentences 
yielding a reduction of the corpus. The align- 
ment learning step finds constituents based on 
the idea of interchangeability, which effectively 
compresses the corpus. 

The bracket selection phase selects con- 
stituents based on the probability of the pos- 



sible constituents. This is in some way similar 
to systems that use distributional information 
to select the most probable syntactic types as 
in ([Finch and Chater, 1992| ) or ( [Redington et 
al., 1998[) ^ On the other hand, ABL assigns a 
probability to the different constituents, which 
is similar to an SCFG (|Baker, 19791 ). 



3 Algorithm 

In this section we describe an unsupervised 
grammar learning algorithm that learns using a 
corpus of plain sentences, so neither pre-tagged 
nor pre-labelled or bracketed sentences are used. 
The output of the algorithm is a labelled, brack- 
eted version of the input corpus. Although 
the algorithm does not generate a (context-free) 
grammar, it is trivial to deduce one from the 
output treebank. 

The algorithm consists of two distinct phases: 
alignment learning and bracket selection. Align- 
ment learning finds possible constituents by 
aligning sentences from the corpus. The bracket 
selection phase selects constituents from the 
possibly overlapping constituents that were 
found in the alignment learning phase. 

Both phases will be described in more detail 
in the next two sections. 

3.1 Alignment Learning 

The alignment learning phase is based on Har- 



ris's idea of interchangeability (Harris, 1951) 
that states that constituents of the same type 
can be replaced by each other. This means that 
for example in the sentence What is a dual car- 
rier the noun phrase a dual carrier may be 
replaced by another noun phrase.f] Replacing 
the noun phrase with the payload of an African 
Swallow yields the sentences What is the pay- 
load of an African Swallow, which is another 
well-formed sentence. 

ABL uses the replacement feature to find con- 
stituents by reversing the idea. Instead of re- 
placing parts of sentences that have the same 
type, this idea is used to find constituents of 
the same type by looking for parts of sentences 
that can be replaced by each other. In the ex- 
ample this means the algorithm looks for parts 
as a dual carrier and the payload of an African 



Swallow, which then are remembered as possi- 
ble constituents. 

Finding replaceable parts of sentences is done 
by finding parts that are identical in two sen- 
tences and parts that are distinct. The distinct 
parts of the sentences are parts that can be in- 
terchanged. See for example the sentences in 
figure The part What is is identical in both 
sentences, while the noun phrases are distinct 
parts. ABL now assumes the distinct parts of 
the sentences are constituents of the same type. 

What is a dual carrier 

What is the payload of an African Swallow 

What is (a dual carrier) x[ 

What is (the payload of an African Swallow) x 1 



Figure 1: Bootstrapping structure 

An instance of the string edit distance algo- 
rithm ( |Wagner and Fischer, 1974 ) that finds the 
longest common subsequences in two sentences 
is used. These longest common subsequences 
are parts that are identical in both sentences. 
The distinct parts of the sentences are exactly 
these parts of the sentences that are not part 
of the longest common subsequences. In fig- 
ure |l|, What is is the longest common subse- 
quence (and the only one), while a dual carrier 
and the payload of an African Swallow are the 
remaining (distinct) parts of the sentences. 

The distinct parts of the sentences, found 
by the string edit distance algorithm, are then 
grouped and labelled. Respective distinct parts 
receive the same non-terminal. Since the al- 
gorithm does not know any linguistically mo- 
tivated names for the non-terminals, it assigns 
names X\, X2, ... to the different constituents. 

More structure is learned when aligning more 
than two sentences. Each new sentence is com- 



For each sentence si in the corpus: 

For every other sentence S2 in the corpus: 
Align s\ to S2 

Find the identical and distinct parts 

between s± and S2 
Assign non-terminals to the constituents 

(i.e. distinct parts of si and S2) 



2 Most example sentences used in this paper can be 
found in the ATIS corpus. 



Figure 2: Alignment learning algorithm 



pared to all (partially structured) sentences in 
the partially structured corpus. An overview of 
the algorithm can be found in figure ^. 

Sentences are always aligned without looking 
at the structure that is already known; all newly 
learned structure is added to the old structure, 
which may yield overlapping constituents. 

(Book Delta 128)x 1 from Dallas to Boston 
( Give me all flights )x x from Dallas to Boston 
Give me ( all flights from Dallas to Boston )x 2 
Give me (information on reservations) x 2 

Figure 3: Overlapping constituents 

In figure || two overlapping constituents are 
learned on the sentence Give me all flights from 
Dallas to Boston. Constituent X\ is learned 
when aligning to the sentence Book Delta 128 
from Dallas to Boston and the other constituent 
(X2) is learned when aligning to the sentence 
Give me information on reservations. The 
bracket selection phase selects brackets until no 
more overlapping brackets are found. 

An unstructured sentence is sometimes 
aligned to a partially structured sentence (which 
was already in the partially structured corpus). 
Aligning these sentences might yield a con- 
stituent that was already present in the par- 
tially structured sentence. The new constituent 
in the unstructured sentence then receives the 
same non-terminal as the constituent in the par- 
tially structured sentence. An example of this 
can be found in figure Sentences 1 and 2 are 
compared, resulting in sentences 3 and 4. 

1 What does (AP57 restriction) x 1 mean 

2 What does aircraft code D8S mean 

3 What does (AP57 restriction) x 1 mean 

4 What does (aircraft code D8S)x 1 mean 



Figure 4: Learning with a partially structured 
sentence and an unstructured sentence 

It may even be the case that two partially 
structured sentences are aligned. This occurs 
when a new sentence has been aligned to a sen- 
tence (and has received some structure) and is 
then aligned to another partially structured sen- 
tence in memory. If this combination of sen- 
tences yields a constituent with two distinct 
non-terminals (a different one in each sentence), 



the two non-terminals are merged. All occur- 
rences of these non-terminals are updated in the 
corpus. In figure || sentences 1 and 2 are com- 
pared, resulting in sentences 3 and 4. 

1 Explain the (meal code)x x 

2 Explain the (restriction AP)x 2 

3 Explain the (meal code)x 3 

4 Explain the (restriction AP)x 3 

Figure 5: Learning with two partially struc- 
tured sentences 

Merging non-terminals (as shown in figure |] 
reduce the number of different non-terminals. 
By merging non-terminals, we assume that con- 
stituents in a certain context can only have one 



type. In section 6.3 we discuss the consequences 



and propose a method that loosens this assump- 
tion. 

3.2 Bracket Selection 

The alignment learning phase may generate un- 
wanted overlapping constituents as can be seen 
in figure ||[ Since we assume the underly- 
ing grammar of the corpus is context-free and 
we want to know the most appropriate disam- 
biguated structure of the sentences of the cor- 
pus, overlapping constituents have to be elimi- 
nated. The bracket selection phase does exactly 
this. 

Three different approaches to the selection of 
constituents have been implemented. Note that 
the selected constituents are kept (i.e. they are 
not removed). 

ABL:incr Assume the constituent learned ear- 
lier is correct. This means that when new 
constituents overlap with older ones, they 
are ignored. 

ABL:leaf Constituents are selected based on 
their probability. The system computes 
the probability of a constituent by count- 
ing the number of times the sequence of 
words in the constituent occurs as a con- 
stituent, normalized by the total number 
of constituents. The probability of a con- 
stituent c can be computed as follows: 



Pi 



leaf \ 



\c' £ C : yield(d) = yield(c)\ 
\C\ 



where C is the entire set of constituents. 



ABL:branch This method is similar to the 
ABLdeaf method. The probability of a 
constituent is now computed by counting 
the number of times the sequence of words 
in a constituent together with its root non- 
terminal occur, normalized by the num- 
ber of constituents with that root non- 
terminal: 



Pbranch(c\rOOt(c) = r) = 

\d E C : yielded) = yield(c) A root{c') = r\ 
\c" G C : root(c") = r\ 

The ABLdncr method may be applied dur- 
ing the alignment learning phase. When a con- 
stituent is found that overlaps with an exist- 
ing constituent, the new constituent will not be 
stored. ABLdeaf and ABL:branch bracket se- 
lection methods are applied after the alignment 
learning phase, since more specific constituent 
counts are available then. 

The probabilistic methods, ABLdeaf and 
ABL:branch, both compute the probability of 
constituents. Since more than just two con- 
stituents can overlap, the methods need to con- 
sider the probability of all possible combina- 
tions of constituents, which is the product of 
the probabilities of the separate constituents as 
in SCFGs (cf. ( [Booth ,19691)). Viter bi style al- 
gorithm optimization ( Viterbi, 1967| ) is used to 
efficiently select the best combination of con- 
stituents. 

Computing the probability of a combina- 
tion of constituents by taking the product of 
the probabilities leads to a "thrashing" effect. 
Since the product of two probabilities is al- 
ways smaller than or equal to the two single 
probabilities, the system has a preference to 
single constituents over a combination of con- 
stituents. Therefore, the geometric mean is 
used to compute the probability of a combi- 
nation of constituents instead ( |Caraballo and 
Charniak, 199£| ).P1 When more combinations of 
constituents have the same probability, one is 
chosen at random. 



4 Discussion 



1 The man saw the girl with his squinting eyes 

2 The man saw the girl with the bikini 

1 i 

3 The man saw the girl with the binoculars 

Figure 7: PP attachment 

4.1 Recursion 

All ABL algorithms generate recursive struc- 
tures. Figure |6| shows some real examples of 
recursive structures generated by the ABL sys- 
tems compared to their structure in the ATIS 
corpus. 

In the first example ABL finds a recursive 
structure, but there is no recursion in the origi- 
nal sentence. The context of the constituents 
is quite similar (both follow the word "on"). 
The second example contains recursive struc- 
tures that are very alike. The original cor- 
pus contains two recursive noun phrases, while 
the learned structure describes recursive noun 
phrases without the determiner. The last ex- 
ample contains recursive noun phrases with con- 
junction. Again similar structures are found in 
the learned and original corpora. 

Intuitively, a recursive structure is formed 
first by building the constituents that form the 
structure of the recursion but with different root 
non-terminals. Now the non-terminals need to 
be merged. This happens when two partially 
structured sentences are compared to each other 
yielding a constituent that already existed in 
both sentences with the non-terminals present 
in the "recursive" structure (see figure ||). The 
non-terminals are then merged, resulting in a 
recursive structure. 

4.2 PP attachment 

Most unsupervised grammar learning systems 
have difficulty regarding prepositional phrase 
attachments. In figure |7] the first two sentences 
seem similar, but the prepositional phrase with 
his squinting eyes indicates the man saw the girl 
using his eyes, while with the bikini determines 
the girl to have the bikini, since it is difficult 
to see using a bikini.^ On the other hand, the 
third sentence in figure ^ is ambiguous. 



Cl. 



The geometric mean of a set of constituents 

. . . ,cn is p(ci a ... a o„) = vn r=i p ( c o 



4 The phrase with his squinting eyes could also mean 
that the girl has squinting eyes, just like the man, but 
this reading is less likely. 



learned Book reservations for five from Dallas to Baltimore on (flight 314 on (May 12th)x 15 )xi 5 
original Book reservations for five from Dallas to Baltimore on (flight 314)np on (May 12th)wp 
learned What is the (name of the (airport in Boston)x ls )x 18 
original What is (the name of (the airport in Boston) np)np 
learned Explain classes QW and (QX and (Y)x 52 )x 5 2 
original Explain classes ((QW)np and (QX)np and (Y)^p)np 

Figure 6: Recursion in the ATIS corpus 



To solve the problem of PP attachment, sev- 
eral types of information may be needed. The 
PP attachment in the first sentence can be 
solved using syntactic information only, since 
his clearly indicates the man is looking. How- 
ever, the second sentence can only be solved 
using semantic information. Without knowing 
that a bikini does not help enhancing sight, it is 
difficult if not impossible to attach the preposi- 
tional phrase to the correct constituent. 

The third sentence shows that sometimes dis- 
course information is needed to solve attach- 
ment. The prepositional phrase with the binoc- 
ulars should be attached to saw or the girl de- 
pending on the who has the binoculars. 

Since the ABL system builds structures based 
on a context-free grammar and does not use 
any other information than the input sentences, 
ABL might seem unable to solve PP attach- 
ments correctly. However, the bracket selection 
phase might find a difference in distribution be- 
tween the girl with his squinting eyes and the 
girl with the bikini. This difference in distribu- 
tion might resolve in the different PP attach- 
ments. 

5 Results 

This section describes the results of apply- 
ing the three ABL algorithms and two base- 
line systems to the Penn Treebank ATIS (Air 
Travel Information System) corpus ( |Marcus~et 
al., 1993|) a nd to the OVIS corpus ( |Bonnema 
et al., 1997D -PI First the test environment is de- 
scribed, followed by a discussion of the test re- 
sults. 

5.1 Test Environment 

The three ABL algorithms, ABL:incr, ABLdeaf 
and ABL:branch have been applied to the ATIS 
and OVIS corpus. The ATIS corpus contains 

5 OVIS (Openbaar Vervoer Informatie Systeem) 
stands for Public Transport Information System. 



716 sentences and 11,777 constituents. The 
larger OVIS corpus contains 10,000 sentences. 
When single word sentences are removed, this 
results in a corpus of 6,797 sentences containing 
48,562 constituents. 

The sentences of the corpora are stripped of 
their structure. The resulting plain sentences 
are used in the learning algorithms and the re- 
sulting structured sentences are then compared 
to the structure of the sentences in the original 
corpora. 

To be able to compare the results, we have 
also computed the results of two baseline sys- 
tems, a left-branching and a right-branching 
system on both corpora. 

The results of the ABL:incr system depend 
on the order of the sentences in the corpus and 
the two probabilistic ABL systems sometimes 
choose constituents at random (i.e. when more 
combinations of constituents have the same 
probability). Therefore, the ABL:incr system is 
applied to ten differently ordered versions of the 
corpora. Likewise, we applied the probabilistic 
ABL systems ten times to the corpora to ac- 
count for the random selection of constituents. 
The mean results are shown in table || and the 
standard deviations are shown in brackets. 

Three different metrics are used to compare 
the results of the different algorithms: 

NCBP Non-Crossing Brackets Precision: the 
percentage of learned constituents that do 
not overlap any constituents in the original 
corpus. 

NCBR Non-Crossing Brackets Recall: the 
percentage of original constituents that do 
not overlap any constituents in the learned 
corpus. 

ZCS Zero-Crossing Sentences: the percentage 
of sentences that do not have any overlap- 
ping constituents. 



NCBP 



NCBR 



J2i \NonLaBr(Oi) \ - \Cross(NonLaBr(Oi), NonLaBr(Ti))\ 
J2i\NonLaBr(Oi)\ 

Y,i \NonLaBr{Ti)\ - \Cross(NonLaBr(Ti), NonLaBr(Oi))\ 



zcs 



\N<mLaBr(Ti)\ 
E i Cross(O l ,T l )=0 
\TEST\ 



NonLaBr(T) denotes the set of non-labelled brackets of the non-terminal nodes of T. 
Cross(U, V) denotes the subset of brackets from U that cross at least one bracket in V. 
Oi represents a tree in the learned corpus. 
Tj is a tree in TEST, the original corpus. flSima'an, 1999| ) 

Figure 8: Formulas of the different metrics 



The formulas describing the metrics can be 
found in figure ||. 

5.2 Test Results 

The results of applying the ABL algorithms and 
the baseline systems on the ATIS and OVIS cor- 
pora can be found in table |l]. The ATIS corpus 
is predominantly right branching as can be seen 
in the results, while the OVIS corpus is right 
branching to a lesser degree. 

The ABL:branch method performs signifi- 
cantly better on all metrics on both the ATIS 
and OVIS corpora compared to the other ABL 
methods. The only deviating ABL result is the 
ZCS of the ABL:incr system in the OVIS cor- 
pus. 

It is interesting to see that the ABL:incr sys- 
tem outperforms ABL:leaf. In the ABL:incr 
system, incorrectly learned constituents will 
never be corrected. The idea behind the prob- 
abilistic methods (including ABLdeaf) is that 
incorrect constituents will be corrected because 
they will receive lower probabilities. Appar- 
ently, the statistics used in the ABLdeaf method 
do not provide enough information to make a 
correct bracket selection, whereas ABL:branch 
does. 

The standard deviation of the results of the 
ABL:incr system is quite large, which indicates 
that the order of the sentences in the corpus is 
important. On the other hand, selecting ran- 
dom constituents (when multiple constituents 
have the same probability) in the probabilistic 



ABL systems yields less varying results. When 
a more specific probabilistic method is used 
(ABL:branch), the variance in results is almost 
zero. 

The right branching system outperforms ABL 
systems on several metrics (the ABL:branch 
method almost keeps up however). This could 
be expected, since the ATIS is predominantly 
right branching (the OVIS corpus to a lesser 
degree). Therefore, the right branching sys- 
tem should perform well. The ABL systems do 
not have a directional branching method built- 
in; furthermore, the ABL systems have no ad- 
justable parameters that can be used to tune 
the algorithms to the specific corpora. This 
means that the right branching system does not 
perform well on a corpus of a predominantly 
left branching language (for example Japanese) , 
while we expect that ABL does. 

It is difficult to compare the results of the 
ABL model against other methods, since often 
different corpora or metrics are used. The meth- 
ods described by Pereira and Schabes (199^ ) 
comes reasonably close to ours. The unsuper- 
vised method learns structure on plain sentences 
from the ATIS corpus resulting in 37.35 % 
non-crossing brackets precision, while the un- 
supervised ABL significantly outperforms this 
method, reaching 86.04 % precision. Only their 
supervised version results in a slightly higher 
precision of 90.36 %. 





Results ATIS corpus 


Results OVIS corpus 




NCBP 


NCBR 


ZCS 


NCBP 


NCBR 


ZCS 


left 

right 

ABL:incr 

ABLdeaf 

ABL:branch 


32.60 
82.70 

82.55 (0.80) 
82.20 (0.30) 
86.04 (0.10) 


76.82 
92.91 

82.98 (0.78) 
82.65 (0.29) 
87.11 (0.09) 


1.12 

38.83 

17.15 (1.17) 
21.05 (0.76) 
29.01 (0.00) 


51.23 
75.85 

88.69 (1.11) 

85.70 (0.01) 
89.39 (0.00) 


73.17 
86.66 

83.90 (1.61) 
79.96 (0.02) 
84.90 (0.00) 


25.22 
48.08 

45.13 (4.12) 
30.87 (0.07) 
42.05 (0.02) 



Table 1: Results ATIS and OVIS corpora 



6 Future Work 

Although the overall result of ABL algorithm is 
slightly disappointing, some of the results of the 
different ABL algorithms are encouraging. We 
expect future extensions of the ABL system to 
improve performance. 

This section contains some future extensions 
to the ABL implementations described in this 
paper. These extensions solve certain problems 
the current ABL systems have. First, we take 
a look at different alignment schemes, that may 
be used as an alternative to the string edit dis- 
tance algorithm. Then an alternative proba- 
bilistic model for bracket selection is considered. 
Finally, we discuss two methods that generate 
more possible constituents by adding context or 
weakening exact match. 

6.1 Different Alignment Schemes 

The edit distance algorithm that is used to 
find identical parts in sentences sometimes 
finds alignments that generate unintended con- 
stituents. If we consider the sentences 1 and 2 in 
figure ||, the edit distance aligns these sentences 
as in sentences 3 and 4. Unfortunately, the 
alignment in sentences 3 and 4 generate unin- 
tended constituents. The most preferred align- 
ment is shown in sentences 7 and 8, since San 
Francisco and Dallas are grouped and receive 
the same type. 

This problem occurs every time the algorithm 
aligns words that are "too far apart". The rel- 
ative distance between the two San Franciscos 
in the two sentences is much larger than the 
relative distance between the word to in both 
sentences. 

There are two possible solutions to this prob- 
lem. The first solution is to change the cost 
function of the edit distance algorithm to let it 



1 from San Francisco to Dallas 

2 from Dallas to San Francisco 

3 from Qx x San Francisco (to Dallas) x 2 

4 from (Dallas to)x 1 San Francisco ()x 2 

5 from (San Francisco to)x 3 Dallas ()x A 

6 from ()x z Dallas (to San Francisco) x 4 

7 from (San Francisco) x 5 to (Dallas) x 6 

8 from (Dallas) x 5 to (San Francisco) 'x 6 

Figure 9: Unintended constituents 

select the better alignment. Another solution is 
to simply generate all possible alignments (us- 
ing a completely different alignment algorithm) 
and let the bracket selection phase of the algo- 
rithm select the best alignment. 

A better cost function should be biased to- 
wards alignments with a small relative distance. 
This can be accomplished by letting the cost 
function return a high cost when the difference 
of the relative offsets of the words is large. The 
relative distance between the two San Fran- 
ciscos in sentences 3 and 4 in figure ^ is larger 
compared to the relative distance between the 
two tos in sentences 7 and 8. Therefore the total 
edit cost of sentences 7 and 8 will be less than 
the edit cost of sentences 3 and 4 or sentences 5 
and 6. 

When all possible alignments are generated, 
there is a large probability the intended align- 
ment will also be found. Unfortunately, it is not 
known which of the possible alignments is the 
intended. The bracket selection phase should 
select the best constituents from all possible 
constituents that are generated by the align- 
ment learning phase. When the probabilistic 
bracket selection methods are used, we just have 
to assume that the intended alignment gen- 



erates more probable constituents, or equiva- 
lently that the intended alignment contains con- 
stituents that occur more often in the corpus 
than the unintended constituents. 

6.2 Alternative Probabilistic Models 

The results of the application of the algorithm 
on the ATIS corpus show that the probabilistic 
methods generate less fluctuating results (i.e. 
they have a smaller standard deviation) than 
the non-probabilistic method. The results of the 
ABL:branch method show less deviation than 
the results of the ABLdeaf method. Further- 
more, the ABL:branch method generates the 
best results. These results indicate that a more 
specific probabilistic method works better in 
bracket selection. 

As future research, a DOP-like probabilistic 
method ( Bod, 1998| ) of bracket selection will be 
implemented. In this approach, all possible con- 
stituents are broken into fragments (cf. elemen- 
tary subtrees in DOP). The probability of such 
a fragment / is: 



P(f\root(f) 



l/l 



Y^f':root(f')=r \f'\ 



When combining these fragments, the struc- 
ture of constituent c can be generated. This 
can be seen as a derivation of the structure of 
the constituent. The probability of a derivation 



fn is: 
P{h 



fn)=U P (fi 



Usually, there is more than one way of deriving 
the structure of the constituent. The probabil- 
ity of the constituent is now the combination of 
all derivations that yield the structure of c: 



P(c) 



d derives c 



The probabilistic model of DOP does not only 
use the counts of terminals or non-terminals 
like in the ABLdeaf and ABL: branch meth- 
ods, but also uses the internal structure of con- 
stituents. This yields a much more specific 
stochastic model. 

The ABL:branch bracket selection method is 
comparable to the probabilistic model of SCFGs 
(cf. ( Booth, 1969|) ), where probabilities of 



context-free grammar rules (terminals and their 
root non-terminal) are used. Since DOP clearly 
outperforms SCFGs (as shown in ( Bod, 1995| )), 
a DOP model in bracket selection is expected 
to increase performance. 

6.3 Adding Context 

Some problematic cases exist where ABL might 
seem unable to learn the correct syntactic type. 
Consider sentences as in figure 10. The ABL 



algorithm finds that morning and nonstop are 
of the same type, since the rest of the two sen- 
tences is identical. However, morning is tagged 
as NN (a noun) and nonstop as JJ (an adjec- 
tive) . 

Show me the (morning) x 1 flights 
Show me the (nonstop) x 1 flights 

Figure 10: Inconsistent syntactic types 

On the other hand, one might argue these 
words are of the same type, exactly because 
they occur in the same context. Both words 
might be seen as some sort of adjective phrase. 

This is a difference between syntactic type 
and functional type. Morning and nonstop have 
a different syntactic type, a noun and an ad- 
jective respectively, but both modify the noun 
flights, i.e. they have the same functional type. 
ABL finds the functional type, while the words 
are tagged according to their syntactic type. 

Since the overall use of the two words differs 
greatly, they occur in different contexts. Morn- 
ing may in general occur in places where nouns 
or noun phrases belong, while nonstop may not. 
This discrepancy can be used to differentiate 
between the two words. Instead of giving the 
words only one type, they get two: one type 
describes the context (i.e. the functional type), 
which is the same for both words and the other 
type describes the syntactic type, which is dif- 
ferent for both words. 

The distribution of syntactic types combined 
with functional types can be used to find words 
that belong to the same syntactic type. Morn- 
ing and nonstop have the same functional type 
but different syntactic types, since morning nor- 
mally occurs in other contexts than nonstop. 

Since the functional type should describe the 
context of a constituent, merging of constituents 
as in figure || and figure || should only ap- 



ply to functional types. Merging of syntactic 
types is only done when the distribution of the 
constituents is similar enough. This effectively 
loosens the assumption that constituents in a 
certain context have the same (syntactic and 
functional) type. 

6.4 Weakening Exact Match 

The algorithm described in this paper is unable 
to learn any structure when two completely dif- 
ferent sentences are compared. (This is not an 
insurmountable problem, since other sentences 
can be used to learn structure on the two sen- 
tences.) 

The algorithms described so far all try to 
align using exactly matching words. Sometimes 
this is a too strong demand; it is enough to 
match similar words. Imagine sentences 1 and 
2 in figure 11, which are completely distinct. 
The standard ABL learning methods would con- 
clude both are sentences, but no more structure 
will be found. When the algorithm knows Book 
and List are words of the same type (represent- 
ing verbs), it would find the structures in sen- 
tences 3 and 4 where the type X\ represents a 
noun phrase. 

1 Book Delta flight 430 

2 List the cost for limousines 

3 Book (Delta flight 430) Xl 

4 List (the cost for limousines) Xi 

Figure 11: Sentences without identical words 

An obvious way of implementing this is by us- 
ing equivalence classes. (See for example (Red- 



ington et al., 1998[) .) Words that are closely 



related are grouped together in the same class. 
Words in the same equivalence class are similar 
enough to be aligned. 

A big advantage of equivalence classes is 
that they can be learned in an unsupervised 
way. Even when the algorithm is extended with 
equivalence classes, it still does not need to be 
initialised with structured training data. 

7 Conclusion 

In this paper a new language learning algorithm 
based on aligning sentences is introduced. It 
uses distinctions between sentences to find pos- 
sible constituents during the alignment learn- 
ing phase and selects the most probable con- 



stituents afterwards in the bracket selection 
phase. 

Three instances of the algorithm have been 
applied to the ATIS corpus (716 sentences) 
and the OVIS corpus (6,797 sentences). The 
alignment learning phase in the three systems 
is the same, but the bracket selection phase 
differs. ABLdncr assumes earlier constituents 
learned are correct, ABLdeaf and ABL:branch 
select brackets based on their probability. The 
ABL:branch method, which uses the most spe- 
cific stochastic model, performs best. 

The ABL algorithm is an unsupervised gram- 
mar learning and bootstrapping system. It uses 
plain sentences to learn structure, so neither 
pre-tagged, pre-labelled nor pre-bracketed sen- 
tences are used. There is no need to train the 
system on a structured corpus and the system 
does not use any parameters that need to be 
set. 

Possible constituents are found by looking at 
an unlimited context. The algorithm does not 
use a fixed window size. Alignments (and thus 
constituents) of arbitrarily large size may be 
considered. 

Furthermore, the ABL algorithm is able to 
learn recursion from a finite set of sentences. 
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